ABSTRACT


An underlying assumption of Stackelberg Games (SGs) is perfect
rationality of the players. However, in real-life situations the
followers (thieves, poachers, smugglers), like humans in general, may
not act in a perfectly rational way, since their decisions may be
affected by various biases that bound their rationality. One of the
popular models of bounded rationality is Anchoring Theory (AT), which
claims that humans have a tendency to flatten the probabilities of
available options, i.e. they perceive the distribution of these
probabilities as being closer to the uniform distribution than it
really is. We propose an efficient formulation of AT in sequential
extensive-form SGs suitable for Mixed-Integer Linear Program (MILP)
solution methods and compare the results of its implementation in
five state-of-the-art methods for solving sequential SGs.
1 INTRODUCTION


Bounded rationality (BR) [15] in problem-solving refers to limitations
of decision-makers that lead them to take non-optimal actions. Apart
from limited cognitive abilities, BR can be attributed to partial
knowledge about the problem, limited resources, or an imprecisely
defined goal [1, 14]. The most popular models of BR are Prospect
Theory [5], Anchoring Theory (AT) [16], Quantal Response [11] and
Framing Effect [17]. In this paper, the AT approach implemented in
COBRA [12, 13] for normal-form games is extended to sequential
extensive-form games in a way that avoids non-linear constraints,
which makes it suitable for a wide range of MILP approaches. AT
assumes that the follower’s perception of the leader’s mixed strategy
is distorted towards the uniform distribution over the possible
actions. A leader who is aware of that distortion can exploit this
weakness when formulating their strategy.

A pure strategy of the player is an assignment of one action to
each potentially reachable state of the game. Let’s denote a set of all
pure strategies of player $i$ by $\Pi_i$. A mixed strategy $\delta_i$ is a
probability distribution over $\Pi_i$. In extensive-form SGs each node in
a game tree is uniquely defined by a pair of sequences: the leader’s
actions and the follower’s actions which lead to that node. These
sequences will be denoted by $\sigma_l$ and $\sigma_f$, resp.

The goal of SG is to find a Strong Stackelberg Equilibrium (SSE) [9],
i.e. a strategy profile $(\delta_l^*, \delta_f^*)$ satisfying the
following two equations:
$\delta_l^* = \arg\max_{\delta_l} u_l(\delta_l, \delta_f^*)$ and
$\delta_f^* = \arg\max_{\delta_f} u_f(\delta_l, \delta_f)$. The second one
defines the follower’s best (optimal) response to the leader’s strategy
$\delta_l$, while the first one selects the best leader’s strategy against
the optimal follower’s response. Additionally, it is assumed that the
follower breaks ties in favour of the leader. $u_i$, $i \in \{l, f\}$, is
the utility/payoff of player $i$.
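
The tie-breaking rule can be illustrated on a toy example. The sketch below is only an illustration under simplifying assumptions: it uses a small normal-form game (payoff matrices indexed by leader action, then follower action) rather than an extensive-form one, and the function name `follower_best_response`, the tolerance `tol` and the payoff values are hypothetical.

```python
from typing import Sequence, Tuple


def follower_best_response(
    delta_l: Sequence[float],
    u_l: Sequence[Sequence[float]],
    u_f: Sequence[Sequence[float]],
    tol: float = 1e-9,
) -> Tuple[int, float, float]:
    """Return (follower action, leader's expected utility, follower's expected utility).

    The follower maximizes its own expected utility against the leader's
    mixed strategy delta_l; ties are broken in favour of the leader.
    """
    n_leader = len(delta_l)
    n_follower = len(u_f[0])
    # Expected utilities of both players for each pure follower action.
    exp_f = [sum(delta_l[i] * u_f[i][j] for i in range(n_leader)) for j in range(n_follower)]
    exp_l = [sum(delta_l[i] * u_l[i][j] for i in range(n_leader)) for j in range(n_follower)]
    best_f = max(exp_f)
    # All follower actions that are optimal up to the tolerance.
    ties = [j for j in range(n_follower) if best_f - exp_f[j] <= tol]
    # Among them, pick the one the leader prefers.
    j_star = max(ties, key=lambda j: exp_l[j])
    return j_star, exp_l[j_star], exp_f[j_star]


# Hypothetical 2x2 game in which the follower is indifferent at delta_l = (0.5, 0.5).
delta_l = [0.5, 0.5]
u_l = [[0.0, 2.0], [1.0, 3.0]]
u_f = [[1.0, 0.0], [0.0, 1.0]]
j_star, leader_value, follower_value = follower_best_response(delta_l, u_l, u_f)
```

Here both follower actions yield the same follower utility, so the action that is better for the leader is selected.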


2 AT IN SEQUENTIAL SG (ATSG) 


ATSG is implemented as a distorted follower’s perception of the
leader’s behavior strategy. Let’s denote by $q(i)$ the probability of
choosing action $i$ by the leader in a given information set (IS),
stemming from its behavior strategy. The most straightforward
implementation of AT (though non-linear in sequence-form games) is to
change the probability of taking this action to
$q'(i) = (1 - \alpha) q(i) + \alpha/M$, where $M$ is the number of actions
available in this IS. However, in sequence-form games, for a given
leader’s feasible sequence of actions
$\sigma_l = a_1, a_2, a_3, \ldots, a_n$, its probability, based on the
behavior strategy, would be $p(\sigma_l) = q(a_1) q(a_2) \cdots q(a_n)$ and
the distorted AT probability would become (*):
$p'(\sigma_l) = ((1 - \alpha) q(a_1) + \alpha/M_1)((1 - \alpha) q(a_2) + \alpha/M_2) \cdots ((1 - \alpha) q(a_n) + \alpha/M_n)$,
where $M_i$ is the number of actions available in the IS in which $a_i$
is played.

State-of-the-art approaches to SSE in extensive-form games utilize
MILP formulations capable of exploiting the sequence form of a game
[2, 3]. Variables $p$ in a MILP formulation of SG are products of the
$q(a_i)$ values presented above (*), and as such cannot be expressed in
a linear form with respect to $q(a_i)$. Consequently, applying the above
AT modification to MILP would result in non-linear constraints,
inadequate for a MILP formulation. We therefore propose to simplify the
above ATSG by dropping the distortion coefficients from all but the
last probability (**):
$p''(\sigma_l) = q(a_1) \cdots q(a_{n-1}) ((1 - \alpha) q(a_n) + \alpha/M_n) = q(a_1) \cdots q(a_{n-1}) \alpha/M_n + (1 - \alpha) q(a_1) \cdots q(a_{n-1}) q(a_n) = p(\mathrm{init}(\sigma_l)) \alpha/M_n + (1 - \alpha) p(\sigma_l)$,
where $\mathrm{init}(\cdot)$ is a function which outputs a sequence
without the last move. The simplified version of ATSG (**) is well
suited to MILP/LP formulations of sequence-form games.
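
The difference between the full distortion (*) and the simplified one (**) can be made concrete with a minimal sketch. It assumes that a leader's sequence is given as a list `q` of behavior-strategy probabilities q(a_1), ..., q(a_n) and a list `m` of the corresponding numbers of available actions M_1, ..., M_n; the function names and the sample values are hypothetical.

```python
from math import prod
from typing import Sequence


def undistorted(q: Sequence[float]) -> float:
    """p(sigma): product of the behavior-strategy probabilities."""
    return prod(q)


def at_full(q: Sequence[float], m: Sequence[int], alpha: float) -> float:
    """Equation (*): every action probability is pulled towards uniform."""
    return prod((1 - alpha) * qk + alpha / mk for qk, mk in zip(q, m))


def at_simplified(q: Sequence[float], m: Sequence[int], alpha: float) -> float:
    """Equation (**): only the last action probability is distorted.

    Written as p(init(sigma)) * alpha / M_n + (1 - alpha) * p(sigma),
    i.e. linear in the two sequence probabilities.
    """
    p_init = prod(q[:-1])  # probability of the sequence without its last move
    return p_init * alpha / m[-1] + (1 - alpha) * prod(q)


# Hypothetical two-move sequence with two actions available in each IS.
q = [0.9, 0.8]
m = [2, 2]
alpha = 0.1
p_plain = undistorted(q)
p_star = at_full(q, m, alpha)
p_double_star = at_simplified(q, m, alpha)
```

For `alpha = 0` both distorted values coincide with `undistorted(q)`, and for a one-move sequence (*) and (**) are identical.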

Please note that relations among probabilities of the leader’s
actions within a single IS are the same according to both equations
(*) and (**), i.e.
$\forall \sigma_l, \sigma_l' : I(\sigma_l) = I(\sigma_l') \Rightarrow p'(\sigma_l)/p'(\sigma_l') = p''(\sigma_l)/p''(\sigma_l')$,
where $p'(\sigma)$, $p''(\sigma)$ represent the probability of sequence
$\sigma$ in a given IS calculated according to (*) and (**), resp.
Furthermore, for a given sequence $\sigma_l$, for small values of
$\alpha$ the difference $|p''(\sigma_l) - p'(\sigma_l)|$ is also small.
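
The ratio property can be verified directly. The derivation below is a sketch under two assumptions that are not stated explicitly in the text: the two sequences share the prefix $a_1, \ldots, a_{n-1}$ and differ only in their last actions $a_n$ and $a_n'$ (as perfect recall would imply), and $q(a_1) \cdots q(a_{n-1}) > 0$.

```latex
\begin{align*}
\frac{p'(\sigma_l)}{p'(\sigma_l')}
  &= \frac{\prod_{k=1}^{n-1}\bigl((1-\alpha)q(a_k)+\alpha/M_k\bigr)\,\bigl((1-\alpha)q(a_n)+\alpha/M_n\bigr)}
          {\prod_{k=1}^{n-1}\bigl((1-\alpha)q(a_k)+\alpha/M_k\bigr)\,\bigl((1-\alpha)q(a_n')+\alpha/M_n\bigr)}
   = \frac{(1-\alpha)q(a_n)+\alpha/M_n}{(1-\alpha)q(a_n')+\alpha/M_n}, \\[4pt]
\frac{p''(\sigma_l)}{p''(\sigma_l')}
  &= \frac{\prod_{k=1}^{n-1}q(a_k)\,\bigl((1-\alpha)q(a_n)+\alpha/M_n\bigr)}
          {\prod_{k=1}^{n-1}q(a_k)\,\bigl((1-\alpha)q(a_n')+\alpha/M_n\bigr)}
   = \frac{(1-\alpha)q(a_n)+\alpha/M_n}{(1-\alpha)q(a_n')+\alpha/M_n}.
\end{align*}
```

In both cases the common prefix factor cancels, so only the distortion of the last action, which (*) and (**) treat identically, determines the ratio.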
Modification of MILP/LP based methods. ATSG formulation
(**) was incorporated into three state-of-the-art MILP methods for
general-sum sequential SGs: BC2015 [2], C2016 [3] and CBK2018 [4].
The first two are exact methods, the last one provides approximate
solutions. In each case, due to the inherent requirement of constraint
linearity, the simplified AT version (**) was used, i.e. each
occurrence of $p(\sigma_l)$ in constraints enforcing the follower’s
optimal response (either stand-alone or as a joint probability
$p(\sigma_l, \sigma_f)$) in the original MILP formulations of these
methods was replaced by
$p(\mathrm{init}(\sigma_l)) \alpha/M_n + (1 - \alpha) p(\sigma_l)$.
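
The reason this substitution keeps the constraints linear is that each distorted term is a fixed-coefficient combination of two sequence-form variables. The sketch below is solver-independent and purely illustrative: sequences are represented as tuples of action labels, the realization plan is a plain dictionary standing in for the MILP variables, and all names (`distorted_coefficients`, `evaluate`, the sample plan) are hypothetical rather than taken from the cited methods.

```python
from typing import Dict, Tuple

Seq = Tuple[str, ...]


def init(sigma: Seq) -> Seq:
    """Return the sequence without its last move."""
    if not sigma:
        raise ValueError("the empty sequence has no last move")
    return sigma[:-1]


def distorted_coefficients(sigma: Seq, alpha: float, m_last: int) -> Dict[Seq, float]:
    """Coefficients of the linear expression that replaces p(sigma).

    Implements p(init(sigma)) * alpha / M_n + (1 - alpha) * p(sigma);
    the empty sequence is left undistorted (an assumption of this sketch).
    """
    if not sigma:
        return {sigma: 1.0}
    return {init(sigma): alpha / m_last, sigma: 1.0 - alpha}


def evaluate(coeffs: Dict[Seq, float], plan: Dict[Seq, float]) -> float:
    """Evaluate the linear expression for concrete variable values."""
    return sum(c * plan[s] for s, c in coeffs.items())


# Hypothetical realization plan: p(()) = 1, p(a) = 0.9, p(a, x) = 0.72.
plan = {(): 1.0, ("a",): 0.9, ("a", "x"): 0.72}
coeffs = distorted_coefficients(("a", "x"), alpha=0.1, m_last=2)
value = evaluate(coeffs, plan)
```

In an actual MILP model the dictionary values would be decision variables and `coeffs` would be used to build the corresponding linear term in each follower-response constraint.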

Modification of heuristic methods. The remaining two methods
are heuristic non-MILP approaches to solving sequential
extensive-form SGs: O2UCT [7, 8] and EASG [18]. The former
(double-oracle UCT sampling) relies on a guided sampling of the
follower’s strategy space interleaved with finding a feasible leader’s
strategy using the double-oracle method. The latter utilizes an
Evolutionary Algorithm (EA) to find the leader’s mixed strategy.

ATSG implementation in O2UCT required using distorted prob- 
abilities (**) in the follower’s oracle when calculating the expected 
value, as well as in a procedure that calculates a difference between 
the follower’s utilities for two strategies. 

Incorporation of ATSG into EASG relies on considering a dis- 
torted version (**) of the leader’s mixed strategy when calculating 
the best follower’s response against which each chromosome is 
evaluated. 

Observe that O2UCT and EASG are flexible in adoption of 
various ATSG formulations. For both methods, contrary to 
MILP/LP ATSG implementations, the potential existence of 
non-linearities in the formulas defining distorted follower’s 
probabilities is not harmful, and - in principle - any other 
BR modification could be used instead of eq. (**). For compa- 
rability reasons, we will use a linear form (**) in the experiments. 


3 EXPERIMENTAL EVALUATION 


In what follows, modifications of the considered methods incorporating
ATSG will be referred to with the prefix AT-.

Benchmark games. Experimental evaluation was performed
on a set of patrolling Warehouse Games introduced in [6]. Game
instances can be downloaded from our project website [10]. The
benchmark set consisted of 25 games generated on a 4 x 4 grid, T =
3,...,7, albeit for T = 7 exact methods were unable to compute
solutions within the allotted time and memory.

Experimental setup. For each game instance (game layout and
game length) AT-O2UCT and AT-EASG were run 10 times, and for
each of the remaining (deterministic) MILP methods a single trial was
performed. Tests were run on Intel Xeon Silver 4116 @ 2.10GHz with
256GB RAM. Experiments with AT-O2UCT and AT-EASG were run in
parallel, each with 8GB RAM assigned. Tests with AT-C2016, AT-CBK2018,
AT-BC2015 were run sequentially with all 256GB RAM available
in each trial. All tests were limited to 200 hours (per single test)
and forcibly terminated if not completed within the allotted time.
Results for all games are presented w.r.t. the aggregated number
of game nodes ($|H|$) in the extensive-form game representation.
This grouping followed the formula:
$bucket = 10^{round(\log_{10} |H|)}$,
where $round$ rounds a number to the nearest integer. Henceforth
$B_i$, $i = 2, \ldots, 7$ will denote the $i$-th bucket of games.
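
The grouping formula can be sketched in a few lines. This is an illustration only: the function names and the node counts are hypothetical, and Python's built-in `round` resolves exact halves to the nearest even integer, which may differ from the rounding convention used in the experiments.

```python
import math


def bucket_index(num_nodes: int) -> int:
    """Index i of the bucket B_i: round(log10 |H|)."""
    return round(math.log10(num_nodes))


def bucket(num_nodes: int) -> int:
    """Bucket label 10 ** round(log10 |H|)."""
    return 10 ** bucket_index(num_nodes)


# Hypothetical game sizes (numbers of nodes |H|).
sizes = [1500, 3000, 3200, 40000]
labels = [bucket(h) for h in sizes]
```

Because rounding happens on the logarithmic scale, the boundary between neighbouring buckets lies near 3.16 times a power of ten, so 3000 and 3200 nodes fall into different buckets.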
Payoffs. The average expected leader’s utilities are compared in
Fig. 1a. AT-C2016 and AT-BC2015 are exact methods, so their results 
are clearly the highest and the resp. plots overlap. Both non-MILP 
heuristic methods perform slightly worse, although for games from 
Bes AT-EASG is a close runner-up, outperforming AT-O2UCT. 

For the largest B7 games the best-performing method is
AT-O2UCT, which outperforms AT-EASG (the only remaining competitor)
by a clear margin. Neither of the two exact MILP methods was capable
of solving games of this size, and the approximate MILP approach
(AT-CBK2018) solved 16 game instances and failed to solve the
remaining 9. Consequently, for the sake of fair comparison, payoff
results of AT-CBK2018 are not presented for B7 games. Generally,
AT-CBK2018 yields the weakest outcomes across the entire range
of game sizes.


[Figure 1 panels: (a) Leader’s payoff. (b) Computation time.]
Figure 1: (a): The average expected leader’s payoff. (b): The
average time requirements.


Time scalability is presented in Fig. 1b. While all methods scale
exponentially, the running times of the non-MILP approaches grow at
a slower pace. For games from B>6 (AT-EASG) and from B>7
(AT-O2UCT), resp., they are already faster than the exact MILP methods.
Obviously, the main asset of AT-C2016 and AT-BC2015 is convergence to
optimal solutions, and hence a comparison of their running times with
those of the heuristic approaches needs to be considered with care.
Nevertheless, it seems reasonable to conclude that beyond a certain
game complexity the exact methods become infeasible and, in such
scenarios, both heuristic approaches present a viable alternative.

The third MILP method is a state-of-the-art algorithm for approx- 
imate solving of extensive-form games. Following [4] AT-CBK2018 
was parameterized in a way which assures fast convergence (€ = 
0.3, 0 = 0.4), though still for the most complex B7 games AT-EASG 
and AT-O2UCT are faster (Fig. 1b), and at the same time provide bet- 
ter solutions (Fig. 1a). Note that AT-CBK2018 solved only 16 games 
from B7 and times for the remaining instances are capped at the 
limit of 200h. This situation favors AT-CBK2018, as for AT-O2UCT 
and AT-EASG the actual times for all games are reported. 

Results summary. Evaluation on a set of 25 games shows that
the non-MILP AT methods (O2UCT [7, 8], EASG [18]) provide optimal
or close-to-optimal leader’s payoffs while being visibly faster than
the exact MILP AT approaches (BC2015 [2], C2016 [3]). At the same
time, they outperform the time-optimized approximate MILP AT method
(CBK2018 [4]) in both payoff quality and time efficiency.